Fengyun Rao

SAIL: Self-Amplified Iterative Learning for Diffusion Model Alignment with Minimal Human Feedback

Feb 05, 2026
Via arXiv

ObjEmbed: Towards Universal Multimodal Object Embeddings

Feb 03, 2026
Via arXiv

MMhops-R1: Multimodal Multi-hop Reasoning

Dec 16, 2025
Via arXiv

WeDetect: Fast Open-Vocabulary Object Detection as Retrieval

Dec 13, 2025
Via arXiv

WAVE: Learning Unified & Versatile Audio-Visual Embeddings with Multimodal LLM

Sep 26, 2025
Via arXiv

HQ-CLIP: Leveraging Large Vision-Language Models to Create High-Quality Image-Text Datasets and CLIP Models

Jul 30, 2025
Via arXiv

WeThink: Toward General-purpose Vision-Language Reasoning via Reinforcement Learning

Jun 09, 2025
Via arXiv

Instruction-augmented Multimodal Alignment for Image-Text and Element Matching

Apr 16, 2025
Via arXiv

From Trial to Triumph: Advancing Long Video Understanding via Visual Context Sample Scaling and Self-reward Alignment

Mar 26, 2025
Via arXiv

Instruction-Oriented Preference Alignment for Enhancing Multi-Modal Comprehension Capability of MLLMs

Mar 26, 2025
Via arXiv